Skip to content

feat: wg-easy add preflights #59

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 16, 2025
Merged

Conversation

nvanthao
Copy link
Member

@nvanthao nvanthao commented May 7, 2025

Add Preflight Checks to WireGuard Easy (wg-easy)

Overview

This PR adds preflight checks to the wg-easy chart.

Implementation Details

  • Each chart-specific preflight is defined as a Kubernetes Secret with label troubleshoot.sh/kind: preflight

  • Preflight specs are stored in each chart's templates directory:

    cert-manager/templates/secret-preflights.yaml
    wg-easy/templates/secret-preflights.yaml
  • Added new task helm-preflight to streamline verification

  • Updated container image to include the preflight binary

Current Status

  • The PR is based on In container only #55 and will be rebased once that PR is merged
  • The cert-manager preflight is currently a placeholder to demonstrate multiple preflight specs merging, not the final preflight content
  • The wg-easy preflight has been implemented with actual checks relevant to the chart

Testing

Locally verify preflight specs with:

helm template <chart-dir> | preflight - --dry-run

Next Steps

  • Finalize cert-manager preflights with actual checks
  • Consider adding preflights to additional charts

Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to do a review and then I realized you've based this on another active development branch but targeted the PR against main.

The problem is that means I'm not reviewing both PR's and I can't tell what you've changed and what the other PR is chaning.

If you base a PR on top of another existing branch I think we need to review against that other PR until it's merged so that the review is only of your current changes not everything from both branches.

Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just watching the asciinema and again I don't know for sure if this is feedback from your PR or the branch you're building on top of but I'll note what I see either way.

  1. You ran a command 'ag' which I'm guessing is an alias or shortcut you've setup yourself? If so you shouldn't do that in a demo I don't know what that command does, hence I can't understand what you're trying to show us by running it.
  2. You've exposed your REPLICATED_API_TOKEN please rotate it and be more careful in the future about recording sensitive values.
  3. The output here is far to verbose, that's part of the cause of item 2 above, slim it down we shouldn't be showing task commands like you show here. Especially when they have API TOKENS in them. That goes for both the green and the white text, it's noisy look at the Embedded Cluster CLI outputs for a target of what good looks like. I think just disable all of the verbose output would be a good start.
  4. As I mentioned in yesterdays' standup, you shouldn't need cluster-create and setup-kubeconfig in your dependencies. As far as I can tell just setup-kubeconfig is sufficient because it already includes creating a cluster. We need to be careful not to unnecessarily re-run and steps for no reason.
  5. This is the biggest one, you're using a bash loop to make this work which is fine for development purposes however how do you see this working in a real install? An end user won't be using Taskfile or this repository, the preflight needs to work through Enterprise Portal and Embedded Cluster installs. Are you addressing that next? I'm fine with this implementation for developers but before we land anything we need to know what this means for end users.

@nvanthao
Copy link
Member Author

nvanthao commented May 8, 2025

many thanks @chris-sanders

  1. I use ag as a drop-in replacement for grep as it is faster
  2. many thanks I have rotated the token for the demo
  3. I removed -v=5 flag
  4. I removed dep cluster-create
  5. In real install, let me do a quick record of what I am seeing and share the thoughts with us in a Loom

@nvanthao
Copy link
Member Author

nvanthao commented May 8, 2025

This is the biggest one, you're using a bash loop to make this work which is fine for development purposes however how do you see this working in a real install? An end user won't be using Taskfile or this repository, the preflight needs to work through Enterprise Portal and Embedded Cluster installs. Are you addressing that next? I'm fine with this implementation for developers but before we land anything we need to know what this means for end users.

I've shared the Loom in our Slack channel, tldr;

  • for KOTS install in existing cluster, the preflights will run during kots install
  • for EC install, the preflight will happen in KOTS Admin Console UI
  • for helm install, Vendor Portal contains the instructions to run helm template | preflight per chart
  • for helm install, Enterprise Portal is missing preflights run instructions as of now

@nvanthao nvanthao requested a review from chris-sanders May 8, 2025 04:13
Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this is still targeting main and not the other PR.
I'm assuming all changes in this PR are yours since you asked for another review, is that accurate?

Comment on lines 1 to 28
apiVersion: v1
kind: Secret
metadata:
name: cert-manager-preflights
labels:
troubleshoot.sh/kind: preflight
type: Opaque
stringData:
preflight.yaml: |
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
name: cert-manager-preflights
spec:
analyzers:
- distribution:
outcomes:
- pass:
when: "== gke"
message: GKE is a supported platform
- pass:
when: "== k3s"
message: K3S is a supported platform
- fail:
when: "== kind"
message: This application does not support Kind
- warn:
message: The Kubernetes platform is not validated, but there are no known compatibility issues.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not use any 'dummy' values. Cert-manager has a list of prerequisites documented. Use real requirements.

@xavpaice xavpaice mentioned this pull request May 11, 2025
@nvanthao
Copy link
Member Author

I just realized this is still targeting main and not the other PR. I'm assuming all changes in this PR are yours since you asked for another review, is that accurate?

Hi @chris-sanders, I can't target the other PR branch as the branch is in a forked repo. As mentioned earlier, I will wait for that PR to be merged to main and rebase this PR.

I've updated the preflights for cert-manager accordingly. I can only find the k8s version as the only prerequisite here

@chris-sanders
Copy link
Member

I didn't realize the other PR was in a forked repo, I'll mention to the team to work out of this repo in the future.

@nvanthao nvanthao marked this pull request as draft May 13, 2025 01:35
@nvanthao nvanthao force-pushed the gerard/sc-122896/preflights branch 3 times, most recently from e7d1870 to 34e4f2d Compare May 13, 2025 03:46
@nvanthao nvanthao marked this pull request as ready for review May 13, 2025 03:55
Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be sure to update the docs as well, that would also probably answer some of the questions I had in the review.

Comment on lines 178 to 193
cmds:
- |
TEMP_FILE=$(mktemp)

# Find all charts and append their templates to the temp file
for chart_dir in $(find charts/ -maxdepth 1 -mindepth 1 -type d); do
helm template "$chart_dir" >> "$TEMP_FILE"
echo "---" >> "$TEMP_FILE" # Add separator between charts
done

# Run preflight once on the combined templates
echo "Running preflight checks on all templates"
cat "$TEMP_FILE" | kubectl preflight - --dry-run

# Clean up
rm "$TEMP_FILE"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me understand the value that this provides.
What is the output of running this when:

  1. There's a failure
  2. It's successful

Is this meant to check formatting of the preflights?
If something is formatted poorly won't it be hard to debug since you concat them all into a temporary file? It seems like running this on each chart individually would be much more straight forward since you would know which chart failed and the line numbers on errors (does it provide those?) would line up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch on the failure, I separate the preflight call now to detect if there's failure.

@nvanthao nvanthao requested a review from chris-sanders May 14, 2025 10:43
Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small suggestions and a topic to consider on using include or files.get for preflight definitions.

Comment on lines +178 to +188
vars:
DRY_RUN: '{{.DRY_RUN | default "false"}}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vars:
DRY_RUN: '{{.DRY_RUN | default "false"}}'
vars:
DRY_RUN:
default: false

I can't imagine we actually want to support setting an ENV for DRY_RUN which is very specific to this one task and not something you likely want set permanently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think the variable is already scoped locally to the helm-preflight task only.

I committed the suggestion but it looks like it is not valid Taskfile syntax

err:  "default" is not a valid variable type. Try "sh", "ref", "map" or using a scalar value
file: /Users/gerard/dev/platform-examples/applications/wg-easy/Taskfile.yaml:180:9
  178 |     vars:
  179 |       DRY_RUN:
> 180 |         default: false
      |         ^
  181 |     cmds: 
  182 |       - |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so the syntax wasn't correct but it can still be done can't it?

vars:
  DRY_RUN: false

Or something like that?

I'm asking why you're explicitly making a dynamic variable when you don't need it to be dynamic. I think you're copying a pattern that we setup when we were setting global variables and importing environments but I don't think it is necessary here.

Maybe I'm just misunderstanding Taskfile and this isn't possible or the dynamic nature of using "{{.DRY_RUN}}" provides some value that I'm not aware of. Does including "{{.DRY_RUN}}" provide some value or is that the only way to set a variable?

Comment on lines 1 to 28
apiVersion: v1
kind: Secret
metadata:
name: cert-manager-preflights
labels:
troubleshoot.sh/kind: preflight
type: Opaque
stringData:
preflight.yaml: |
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
name: cert-manager-preflights
spec:
analyzers:
# https://github.com/cert-manager/cert-manager/blob/master/deploy/charts/cert-manager/README.template.md#prerequisites
- clusterVersion:
outcomes:
- fail:
when: "< 1.22.0"
message: The application requires at least Kubernetes 1.22.0, and recommends 1.25.0.
uri: https://cert-manager.io/docs/installation/helm/#prerequisites
- warn:
when: "< 1.25.0"
message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.25.0 or later.
uri: https://cert-manager.io/docs/installation/helm/#prerequisites
- pass:
message: Your cluster meets the recommended and required versions of Kubernetes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using the pattern Chuck introduces in POVs to keep the preflight valid yaml.

apiVersion: v1
kind: Secret
metadata:
  name: cert-manager-preflights
  labels:
    troubleshoot.sh/kind: preflight
type: Opaque
stringData:
  preflight.yaml: |-
{{ .Files.Get "templates/preflight.yaml" | indent 4 }}

Then you would put: at applications/wg-easy/charts/cert-manager/templates/preflight.yaml

apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: cert-manager-preflights
spec:
  analyzers:
    # https://github.com/cert-manager/cert-manager/blob/master/deploy/charts/cert-manager/README.template.md#prerequisites
    - clusterVersion:
        outcomes:
          - fail:
              when: "< 1.22.0"
              message: The application requires at least Kubernetes 1.22.0, and recommends 1.25.0.
              uri: https://cert-manager.io/docs/installation/helm/#prerequisites
          - warn:
              when: "< 1.25.0"
              message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.25.0 or later.
              uri: https://cert-manager.io/docs/installation/helm/#prerequisites
          - pass:
              message: Your cluster meets the recommended and required versions of Kubernetes.

Finally, I think you would need to in the .helmignore:

templates/preflight.yaml

This allows you to edit the preflight as an actually valid yaml file. If you needed to adjust the preflight based on helm values in the future you could just as easily use an include and change the preflight to a .tpl and it's fully helm temptable. I'm not sure if .Files.Get or include is better but at least in this example you don't actually need the templating so I just went for the Files.Get pattern.

This might even simplify the preflight checking you're doing with dry run because you can call the file directly without having to go through the helm chart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is great improvement using both Files.Get or include. I opted in for include now as from Helm docs

Some files cannot be accessed through the .Files object, usually for security reasons.
Files in templates/ cannot be accessed.
Files excluded using .helmignore cannot be accessed.

We will have to create new folder on same level with templates which would unnecessarily complicate our directory structure. Also include allows future preflight checks to access Helm values more easily.

@nvanthao nvanthao requested a review from chris-sanders May 15, 2025 08:27
Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking for clarity on one last thing the rest looks good to me.

Comment on lines +178 to +188
vars:
DRY_RUN: '{{.DRY_RUN | default "false"}}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so the syntax wasn't correct but it can still be done can't it?

vars:
  DRY_RUN: false

Or something like that?

I'm asking why you're explicitly making a dynamic variable when you don't need it to be dynamic. I think you're copying a pattern that we setup when we were setting global variables and importing environments but I don't think it is necessary here.

Maybe I'm just misunderstanding Taskfile and this isn't possible or the dynamic nature of using "{{.DRY_RUN}}" provides some value that I'm not aware of. Does including "{{.DRY_RUN}}" provide some value or is that the only way to set a variable?

nvanthao and others added 5 commits May 16, 2025 18:11
Co-authored-by: Chris Sanders <chriss@replicated.com>
Co-authored-by: Chris Sanders <chriss@replicated.com>
@nvanthao nvanthao force-pushed the gerard/sc-122896/preflights branch from c4b4189 to 9f15105 Compare May 16, 2025 08:11
@nvanthao
Copy link
Member Author

The dynamic variable is so that user can toggle between validation mode and execution mode of preflight without modifying the Taskfile.

  • task helm-preflight DRY_RUN=true - runs in validation mode, checking content of preflight spec to run without executing anything
  • task helm-preflight - runs the full preflight checks against the cluster.

@nvanthao nvanthao requested a review from chris-sanders May 16, 2025 08:25
Copy link
Member

@chris-sanders chris-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@chris-sanders chris-sanders merged commit a4d7404 into main May 16, 2025
1 check passed
@chris-sanders chris-sanders deleted the gerard/sc-122896/preflights branch May 16, 2025 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants